Sentence Selection by Direct Likelihood Maximization for Language Model Adaptation

Authors

  • Takahiro Shinozaki
  • Yu Kubota
  • Sadaoki Furui
  • Eiji Utsunomiya
  • Yasutaka Shindoh
Abstract

A general framework for language model task adaptation is to select documents from a large training set using a language model estimated on development data. This strategy has a drawback, however: the selected documents are biased toward the most frequent patterns in the development data. To address this problem, a new task adaptation method is proposed that selects documents from the training set so as to directly reduce the perplexity on the development set. In addition, a weighting method that modifies the perplexity objective function is proposed to improve generalization to unseen data. The proposed adaptation methods are evaluated in large-vocabulary speech recognition experiments. The results show that the proposed adaptation with the weighting term produces a compact model that gives consistently lower word error rates across different tasks.
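
The idea of selecting training sentences so as to directly reduce development-set perplexity can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a greedy search, an add-one-smoothed unigram model, and whitespace tokenization, and it omits the weighting term, whose form the abstract does not give. The names `dev_log_likelihood` and `greedy_select` are hypothetical.

```python
import math
from collections import Counter


def dev_log_likelihood(counts, total, dev_counts, vocab_size, alpha=1.0):
    """Log-likelihood of the development set under an add-alpha unigram model."""
    denom = total + alpha * vocab_size
    return sum(
        n * math.log((counts.get(word, 0) + alpha) / denom)
        for word, n in dev_counts.items()
    )


def greedy_select(train_sentences, dev_sentences, max_sentences):
    """Greedily add the sentence that most increases dev-set likelihood.

    Assumption: a unigram model and greedy search stand in for whatever
    model and search procedure the paper actually uses.
    Returns the selected sentence indices and the final dev-set perplexity.
    """
    dev_counts = Counter(w for s in dev_sentences for w in s.split())
    n_dev = sum(dev_counts.values())
    vocab = set(dev_counts) | {w for s in train_sentences for w in s.split()}
    vocab_size = len(vocab)

    counts, total = Counter(), 0
    selected = []
    remaining = list(range(len(train_sentences)))
    best_ll = dev_log_likelihood(counts, total, dev_counts, vocab_size)

    while remaining and len(selected) < max_sentences:
        best_index = None
        for i in remaining:
            words = train_sentences[i].split()
            trial_counts = counts + Counter(words)
            ll = dev_log_likelihood(
                trial_counts, total + len(words), dev_counts, vocab_size
            )
            if ll > best_ll:
                best_index, best_ll = i, ll
        if best_index is None:
            break  # no remaining sentence lowers the dev-set perplexity
        words = train_sentences[best_index].split()
        counts.update(words)
        total += len(words)
        selected.append(best_index)
        remaining.remove(best_index)

    perplexity = math.exp(-best_ll / n_dev)
    return selected, perplexity


train = ["the cat sat", "stock prices fell sharply", "a cat sat down"]
dev = ["the cat sat down"]
selected, perplexity = greedy_select(train, dev, max_sentences=3)
print(selected, round(perplexity, 2))
```

In this toy run the out-of-domain sentence is never selected, because adding it would raise the development-set perplexity. Stopping as soon as no candidate helps is one way such a procedure can yield a compact selected set.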

Similar resources

Joint and Coupled Bilingual Topic Model Based Sentence Representations for Language Model Adaptation

This paper is concerned with data selection for adapting the language model (LM) in statistical machine translation (SMT), and aims to find the LM training sentences that are topically similar to the translation task. Although the traditional approaches have achieved significant performance gains, they ignore the topic information and the distribution information of words when selecting similar training senten...

Discounted likelihood linear regression for rapid speaker adaptation

The widely used maximum likelihood linear regression speaker adaptation procedure suffers from overtraining when used for rapid adaptation tasks in which the amount of adaptation data is severely limited. This is a well known difficulty associated with the expectation maximization algorithm. We use an information geometric analysis of the expectation maximization algorithm as an alternating min...

Assessing the Translation of Parvin Etesami's Selected Poems Using Vinay and Darbelnet’s Model

Translators always seek to find the best equivalents for each word, sentence or phrase in the target language (TL) in order to have the most accurate and meaningful translation of the text. Generally, a translator’s main concern is whether to prefer the form over the content or vice versa. In translation studies, literal translation prioritizes the form while free translation concentrates on th...

Combining a mixture language model and Naive Bayes for multi-document summarisation

The TNO system for multi-document summarisation is based on an extraction approach. We combined two statistical methods for sentence selection with a variant of the MMR algorithm. After sentence segmentation, each sentence is scored on the basis of two probabilistic models. The first model scores sentences based on a (generative) unigram language model, which is a mixture of a cluster model, a ...

A Supplier Selection Model for Social Responsible Supply Chain

Due to the importance of the supplier selection issue in supply chain management (SCM) and the increasing attention organizations pay to their social responsibilities, in this paper we survey the supplier selection issue as a multi-objective problem while considering corporate social responsibility (CSR) as a mathematical parameter. The purpose of this paper is to design a mode...

Publication date: 2011